It’s a common use case when working with geo-spatial data to augment one spatial dataset with information from another spatial dataset – this can be done via spatial joins. In this post for example, I will show how to augment a dataset containing school locations with socioeconomic data of their surrounding statistical region. This approach has the drawback that the surrounding statistical region doesn’t reflect the actual catchment area of the school. I will present an approach based on intersection with Voronoi regions which should more accurately represent the socioeconomic characteristics of the population around a school.
For this example, I’d like to compare child poverty (more accurately the percentage of children whose parents obtain social welfare) in the neighborhood regions around public and private primary schools in Berlin. This blog post concentrates on how to join the point samples (the schools) with the surrounding statistical regions, so I will present only a few summary statistics in the end since proper spatial modeling is beyond the scope of this blog post.
We will work with three datasets: The first spatial dataset contains the shape of the statistical regions in Berlin, the second dataset contains the socioeconomic data for these regions and the third dataset contains the locations and other attributes of primary schools in Berlin.
All data and the code are available in the GitHub repository. We will use the sf package for working with spatial data in R, dplyr for data management and ggplot2 for a few more advanced visualizations, i.e. when base plot() is not sufficient:
library(sf)
library(dplyr)
library(ggplot2)
We will at first load a dataset with the most granular statistical regions for Berlin, called Planungsräume (planning areas). We select the area ID and name as spatial attributes. The result is a spatial dataframe (a simple feature (sf) collection).
bln_plan <- read_sf('data/berlin_plr.shp') %>%
mutate(areaid = as.integer(SCHLUESSEL)) %>% # transform character SCHLUESSEL to numeric area ID
select(areaid, name = PLR_NAME)
bln_plan
## Simple feature collection with 447 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 370000.7 ymin: 5799520 xmax: 415785.3 ymax: 5837259
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 447 x 3
## areaid name geometry
## <int> <chr> <MULTIPOLYGON [m]>
## 1 1011101 Stülerstr. (((387256.6 5818552, 387323.1 5818572, 387418.9 5818…
## 2 1011102 Großer Tiergar… (((386767.5 5819393, 386768.3 5819389, 386769.6 5819…
## 3 1011103 Lützowstr. (((387952.6 5818275, 387986.7 5818313, 387994.6 5818…
## 4 1011104 Körnerstr. (((388847.1 5817875, 388855.5 5817899, 388865.1 5817…
## 5 1011105 Nördlicher Lan… (((388129.5 5819015, 388157.1 5819017, 388170.8 5819…
## 6 1011201 Wilhelmstr. (((389845.7 5819286, 389840.9 5819311, 389846.1 5819…
## 7 1011202 Unter den Lind… (((389851.2 5819673, 389847.5 5819682, 389846.3 5819…
## 8 1011203 Linden Süd (((390147.6 5819688, 390307.9 5819702, 390458.4 5819…
## 9 1011204 Leipziger Str. (((390523.4 5819192, 390705.4 5819207, 390727.2 5819…
## 10 1011301 Charitéviertel (((389526.5 5820432, 389528.2 5820499, 389528.9 5820…
## # … with 437 more rows
When printing this dataframe, the header reveals another important information: The coordinate reference system (CRS) of this dataset is ETRS89 / UTM zone 33N. We will later need to make sure that the coordinates of the school locations and the coordinates of the planning areas use the same coordinate system.
This data can be joined with socioeconomic information provided from official sources. Luckily, Helbig/Salomo 2021 compiled these information for many cities in Germany (available for download) among which is data for Berlin from 2020. I’ve created an excerpt with percentages of residents receiving social welfare (welfare) and percentage of children under 15 years whose parents receive social welfare (welfare_chld):
(bln_welfare <- read.csv('data/berlin_welfare.csv', stringsAsFactors = FALSE))
## areaid areaname welfare welfare_chld
## 1 1011101 Stülerstraße 10.09 15.44
## 2 1011102 Großer Tiergarten 4.76 0.00
## 3 1011103 Lützowstraße 22.21 36.80
## 4 1011104 Körnerstraße 24.81 42.14
## 5 1011105 Nördlicher Landwehrkanal 2.82 3.53
## 6 1011201 Wilhelmstraße 12.13 19.03
## 7 1011202 Unter den Linden Nord 3.35 11.11
## 8 1011203 Unter den Linden Süd 6.59 6.25
## 9 1011204 Leipziger Straße 10.28 19.91
## 10 1011301 Charitéviertel 3.68 3.92
## 11 1011302 Oranienburger Straße 8.16 10.44
## 12 1011303 Alexanderplatzviertel 11.69 19.64
## 13 1011304 Karl-Marx-Allee 23.54 36.33
## 14 1011305 Heine-Viertel West 10.07 17.00
## 15 1011306 Heine-Viertel Ost 9.67 15.43
## 16 1011401 Invalidenstraße 6.29 7.70
## 17 1011402 Arkonaplatz 4.46 3.53
## 18 1022101 Huttenkiez 32.74 66.92
## 19 1022102 Beusselkiez 29.07 51.86
## 20 1022103 Westhafen 57.30 60.94
## 21 1022104 Emdener Straße 23.48 40.66
## 22 1022105 Zwinglistraße 29.85 50.84
## 23 1022106 Elberfelder Straße 12.41 16.22
## 24 1022201 Stephankiez 20.31 34.06
## 25 1022202 Heidestraße 23.48 33.33
## 26 1022203 Lübecker Straße 36.69 49.52
## 27 1022204 Thomasiusstraße 11.10 13.83
## 28 1022205 Zillesiedlung 36.66 37.44
## 29 1022206 Lüneburger Straße 12.11 13.47
## 30 1022207 Hansaviertel 8.86 14.65
## 31 1033101 Soldiner Straße 46.52 65.26
## 32 1033102 Gesundbrunnen 36.76 54.68
## 33 1033201 Brunnenstraße 45.94 54.89
## 34 1033202 Humboldthain Süd 33.11 45.58
## 35 1033203 Humboldthain Nordwest 41.51 60.51
## 36 1044101 Rehberge 28.63 48.26
## 37 1044102 Schillerpark 29.37 45.61
## 38 1044103 Westliche Müllerstraße 27.26 53.81
## 39 1044201 Reinickendorfer Straße 42.91 64.20
## 40 1044202 Sparrplatz 26.80 50.13
## 41 1044203 Leopoldplatz 37.25 58.92
## 42 2010101 Askanischer Platz 28.38 39.81
## 43 2010102 Mehringplatz 41.46 51.81
## 44 2010103 Moritzplatz 49.00 59.77
## 45 2010104 Wassertorplatz 45.07 59.56
## 46 2020201 Gleisdreieck/Entwicklungsgebiet 4.55 6.45
## 47 2020202 Rathaus Yorckstraße 19.39 27.30
## 48 2020203 Viktoriapark 13.26 17.78
## 49 2020204 Urbanstraße 14.21 19.21
## 50 2020205 Chamissokiez 13.34 18.42
## 51 2020206 Graefekiez 20.68 26.98
## 52 2030301 Oranienplatz 33.43 49.48
## 53 2030302 Lausitzer Platz 26.05 33.86
## 54 2030401 Reichenberger Straße 20.36 28.02
## 55 2030402 Wrangelkiez 18.89 23.93
## 56 2040501 Barnimkiez 25.00 35.70
## 57 2040502 Friedenstraße 14.66 23.31
## 58 2040503 Richard-Sorge-Viertel 10.18 13.98
## 59 2040701 Andreasviertel 21.96 38.55
## 60 2040702 Weberwiese 8.51 8.65
## 61 2040703 Wriezener Bahnhof/Entwicklungsgebiet 30.24 58.54
## 62 2050601 Hausburgviertel 12.94 17.19
## 63 2050602 Samariterviertel 11.66 13.34
## 64 2050801 Traveplatz 15.85 23.06
## 65 2050802 Boxhagener Platz 9.40 11.78
## 66 2050803 Stralauer Kiez 11.81 18.47
## 67 2050804 Stralauer Halbinsel 4.19 2.81
## 68 3010101 Bucher Forst 14.40 12.61
## 69 3010102 Buch 28.01 31.88
## 70 3010104 Lietzengraben 33.67 22.22
## 71 3020203 Blankenfelde 6.66 5.57
## 72 3020209 Niederschönhausen 7.24 6.42
## 73 3020210 Herthaplatz 10.25 11.11
## 74 3020307 Buchholz 8.05 9.99
## 75 3030405 Karow Nord 16.56 20.40
## 76 3030406 Alt-Karow 3.64 2.57
## 77 3030711 Blankenburg 2.86 1.96
## 78 3030715 Heinersdorf 7.55 7.46
## 79 3030716 Märchenland 5.08 6.40
## 80 3040508 Rosenthal 5.52 5.93
## 81 3040512 Wilhelmsruh 7.34 9.09
## 82 3040513 Schönholz 5.22 3.36
## 83 3040614 Pankow Zentrum 10.59 9.21
## 84 3040818 Pankow Süd 11.01 12.11
## 85 3050919 Gustav-Adolf-Straße 21.93 28.80
## 86 3050920 Weißer See 10.68 9.83
## 87 3050923 Weißenseer Spitze 11.62 11.09
## 88 3050924 Behaimstraße 15.55 14.53
## 89 3050925 Komponistenviertel Weißensee 13.37 13.39
## 90 3051017 Rennbahnstraße 15.55 17.91
## 91 3051021 Buschallee 18.16 25.76
## 92 3051022 Hansastraße 16.28 24.96
## 93 3061126 Arnimplatz 9.41 9.10
## 94 3061131 Falkplatz 6.92 5.75
## 95 3061227 Humannplatz 6.97 6.13
## 96 3061228 Erich-Weinert-Straße 13.07 20.05
## 97 3061332 Helmholtzplatz 8.42 7.02
## 98 3061429 Greifswalder Straße 23.82 35.80
## 99 3061430 Volkspark Prenzlauer Berg 37.89 34.93
## 100 3061434 Anton-Saefkow-Park 10.58 8.50
## 101 3061435 Conrad-Blenkle-Straße 16.62 24.89
## 102 3061441 Eldenaer Straße 1.07 0.64
## 103 3071536 Teutoburger Platz 6.52 6.98
## 104 3071537 Kollwitzplatz 5.89 3.82
## 105 3071633 Thälmannpark 18.75 19.77
## 106 3071638 Winsstraße 7.13 6.37
## 107 3071639 Bötzowstraße 6.86 4.88
## 108 4010101 Jungfernheide 30.69 46.39
## 109 4010102 Plötzensee 35.89 77.00
## 110 4010103 Paul-Hertz-Siedlung 45.22 52.32
## 111 4020204 Olympiagelände 0.00 0.00
## 112 4020205 Siedlung Ruhleben 4.98 7.33
## 113 4020206 Angerburger Allee 8.38 13.58
## 114 4020207 Flatowallee 5.21 7.64
## 115 4020208 Kranzallee 5.69 4.81
## 116 4020209 Eichkamp 2.27 2.96
## 117 4020310 Park Ruhwald 10.45 48.28
## 118 4020311 Reichsstraße 7.90 9.49
## 119 4020312 Branitzer Platz 7.10 7.51
## 120 4020313 Königin-Elisabeth-Straße 11.90 19.64
## 121 4020314 Messegelände 0.00 0.00
## 122 4030415 Schloßgarten 21.79 34.06
## 123 4030416 Klausenerplatz 19.22 26.51
## 124 4030417 Schloßstraße 12.43 17.43
## 125 4030518 Tegeler Weg 17.00 28.10
## 126 4030519 Kaiserin-Augusta-Allee 18.67 33.19
## 127 4030620 Alt-Lietzow 13.92 21.84
## 128 4030621 Spreestadt 16.15 29.77
## 129 4030622 Richard-Wagner-Straße 16.41 22.80
## 130 4030623 Ernst-Reuter-Platz 17.31 24.87
## 131 4030724 Lietzensee 10.83 13.25
## 132 4030725 Amtsgerichtsplatz 10.29 17.08
## 133 4030726 Droysenstraße 13.82 24.81
## 134 4030827 Karl-August-Platz 14.60 22.58
## 135 4030828 Savignyplatz 8.03 13.34
## 136 4030929 Hindemithplatz 6.68 6.98
## 137 4030930 George-Grosz-Platz 7.08 10.07
## 138 4030931 Breitscheidplatz 11.33 20.21
## 139 4031032 Halensee 10.11 14.61
## 140 4041133 Güterbahnhof Grunewald 8.33 0.00
## 141 4041134 Bismarckallee 6.95 7.60
## 142 4041135 Hundekehle 3.64 2.47
## 143 4041136 Hagenplatz 4.67 4.90
## 144 4041137 Flinsberger Platz 11.36 16.41
## 145 4041238 Kissinger Straße 11.03 11.20
## 146 4041239 Stadion Wilmersdorf 43.46 25.00
## 147 4041240 Messelpark 2.79 2.52
## 148 4041241 Breite Straße 6.18 4.65
## 149 4041342 Schlangenbader Straße 15.82 18.66
## 150 4041343 Binger Straße 8.09 8.65
## 151 4041344 Rüdesheimer Platz 7.20 6.94
## 152 4051445 Eisenzahnstraße 15.37 23.83
## 153 4051446 Preußenpark 7.62 7.88
## 154 4051447 Ludwigkirchplatz 10.88 14.89
## 155 4051448 Schaperstraße 11.28 17.22
## 156 4051549 Rathaus Wilmersdorf 19.50 31.79
## 157 4051550 Leon-Jessel-Platz 11.07 17.15
## 158 4051551 Brabanter Platz 13.60 22.38
## 159 4051652 Nikolsburger Platz 9.92 13.09
## 160 4051653 Prager Platz 10.36 12.13
## 161 4051654 Wilhelmsaue 10.38 13.03
## 162 4051655 Babelsberger Straße 12.17 13.21
## 163 4051656 Hildegardstraße 8.19 9.86
## 164 4061757 Forst Grunewald 0.00 0.00
## 165 5010101 Hakenfelde Nord 18.25 24.16
## 166 5010102 Goltzstraße 26.10 39.79
## 167 5010103 Amorbacher Weg 14.12 18.33
## 168 5010204 Griesingerstraße 38.94 37.30
## 169 5010205 An der Tränke 5.26 9.01
## 170 5010206 Gütersloher Weg 40.41 51.34
## 171 5010207 Darbystraße 42.82 49.57
## 172 5010208 Germersheimer Platz 36.67 50.75
## 173 5010209 An der Kappe 16.05 22.76
## 174 5010310 Eckschanze 31.06 43.47
## 175 5010311 Eiswerder 39.68 48.25
## 176 5010312 Kurstraße 43.94 59.70
## 177 5010313 Ackerstraße 30.51 42.67
## 178 5010314 Carl-Schurz-Straße 28.37 41.44
## 179 5010339 Freiheit 15.40 7.84
## 180 5020415 Isenburger Weg 3.55 4.90
## 181 5020416 Am Heideberg 10.84 10.46
## 182 5020417 Staakener Straße 14.25 18.41
## 183 5020418 Spandauer Straße 21.54 23.13
## 184 5020419 Magistratsweg 36.66 46.97
## 185 5020420 Werkstraße 11.05 15.24
## 186 5020521 Döberitzer Weg 7.22 8.59
## 187 5020522 Pillnitzer Weg 52.75 60.68
## 188 5020523 Maulbeerallee 70.34 70.10
## 189 5020524 Weinmeisterhornweg 12.29 17.11
## 190 5020625 Borkumer Straße 24.64 39.33
## 191 5020626 Adamstraße 25.54 37.42
## 192 5020627 Tiefwerder 33.66 46.07
## 193 5020628 Graetschelsteig 27.73 37.76
## 194 5020629 Börnicker Straße 6.45 4.69
## 195 5030730 Zitadellenweg 18.06 23.83
## 196 5030731 Gartenfelder Straße 33.58 46.14
## 197 5030832 Rohrdamm 26.50 39.10
## 198 5030833 Motardstraße 24.36 55.77
## 199 5040934 Alt-Gatow 13.40 20.88
## 200 5040935 Groß-Glienicker Weg 16.37 9.09
## 201 5040936 Jägerallee 3.37 2.87
## 202 5040937 Kladower Damm 1.63 1.16
## 203 5040938 Kafkastraße 3.75 5.05
## 204 6010101 Fichtenberg 4.51 4.07
## 205 6010102 Schloßstr. 10.22 12.95
## 206 6010103 Markelstraße 7.00 7.78
## 207 6010204 Munsterdamm 12.59 16.09
## 208 6010205 Südende 12.45 17.92
## 209 6010206 Stadtpark 10.85 13.77
## 210 6010207 Mittelstraße 11.46 16.37
## 211 6010208 Bergstraße 10.06 11.33
## 212 6010209 Feuerbachstraße 10.67 14.11
## 213 6010210 Bismarckstraße 11.84 15.31
## 214 6020301 Alt-Lankwitz 12.56 16.70
## 215 6020302 Komponistenviertel Lankwitz 7.81 11.13
## 216 6020303 Lankwitz Kirche 17.50 24.18
## 217 6020304 Kaiser-Wilhelm-Straße 16.52 22.69
## 218 6020305 Gemeindepark Lankwitz 17.53 23.38
## 219 6020306 Lankwitz Süd 8.76 12.53
## 220 6020407 Thermometersiedlung 31.20 42.81
## 221 6020408 Lichterfelde Süd 17.84 24.09
## 222 6020409 Königsberger Straße 11.13 15.72
## 223 6020410 Oberhofer Platz 7.12 9.52
## 224 6020411 Schütte-Lanz-Straße 12.80 19.76
## 225 6030501 Berlepschstr. 8.49 11.80
## 226 6030502 Zehlendorf Süd 13.19 18.42
## 227 6030503 Zehlendorf Mitte 7.24 9.09
## 228 6030504 Teltower Damm 4.68 4.66
## 229 6030605 Botanischer Garten 8.38 9.13
## 230 6030606 Hindenburgdamm 12.71 15.92
## 231 6030607 Goerzwerke 7.26 10.99
## 232 6030608 Schweizer Viertel 6.25 6.06
## 233 6030609 Augustaplatz 9.41 11.81
## 234 6030610 Lichterfelde West 4.63 4.54
## 235 6040701 Wannsee 4.39 4.30
## 236 6040702 Düppel 5.41 5.45
## 237 6040703 Nikolassee 3.74 3.64
## 238 6040804 Krumme Lanke 2.48 1.86
## 239 6040805 Fischerhüttenstraße 4.44 4.37
## 240 6040806 Fischtal 4.75 4.94
## 241 6040807 Zehlendorf Eiche 7.13 6.60
## 242 6040808 Hüttenweg 2.68 3.12
## 243 6040809 Thielallee 1.48 1.54
## 244 6040810 Dahlem 3.47 4.02
## 245 7010101 Wittenbergplatz/Viktoria-Luise-Platz 14.33 20.39
## 246 7010102 Nollendorfplatz 26.83 44.04
## 247 7010103 Barbarossaplatz 10.93 14.90
## 248 7010104 Dennewitzplatz 26.50 38.59
## 249 7020201 Bayerischer Platz 10.89 12.90
## 250 7020202 Volkspark (Rudolf-Wilde-Park) 17.28 24.47
## 251 7020203 Kaiser-Wilhelm-Platz 17.37 24.73
## 252 7020204 Schöneberger Insel 12.07 16.71
## 253 7030301 Friedenau 7.06 7.92
## 254 7030302 Ceciliengärten 9.13 11.98
## 255 7030303 Grazer Platz 18.59 24.74
## 256 7040401 Neu-Tempelhof 17.99 22.31
## 257 7040402 Lindenhofsiedlung 20.16 24.81
## 258 7040403 Manteuffelstraße 20.16 30.32
## 259 7040404 Marienhöhe 16.11 23.61
## 260 7040405 Rathaus Tempelhof 24.68 37.23
## 261 7040406 Germaniagarten 38.72 58.14
## 262 7050501 Rathausstraße 25.41 37.59
## 263 7050502 Fritz-Werner-Straße 24.53 35.64
## 264 7050503 Eisenacher Straße 19.60 29.35
## 265 7050504 Imbrosweg 22.10 27.74
## 266 7050505 Hundsteinweg 17.54 26.02
## 267 7050506 Birnhornweg 7.37 12.31
## 268 7060601 Marienfelder Allee Nordwest 35.40 44.05
## 269 7060602 Kirchstraße 7.52 8.96
## 270 7060603 Marienfelde Nordost 28.51 40.20
## 271 7060604 Marienfelde Süd 29.00 37.41
## 272 7070701 Kettinger Straße/Schillerstraße 12.47 16.64
## 273 7070702 Alt-Lichtenrade/Töpchiner Weg 13.81 17.88
## 274 7070703 John-Locke-Straße 24.78 31.27
## 275 7070704 Nahariyastraße 47.62 58.93
## 276 7070705 Franziusweg/Rohrbachstraße 7.24 7.76
## 277 7070706 Horstwalder Straße/Paplitzer Straße 17.60 23.37
## 278 7070707 Wittelsbacherstraße 10.32 18.35
## 279 8010115 Hasenheide 15.35 13.28
## 280 8010116 Wissmannstraße 29.79 46.82
## 281 8010117 Schillerpromenade 25.24 39.61
## 282 8010118 Silbersteinstraße 39.28 60.71
## 283 8010211 Flughafenstraße 27.12 46.12
## 284 8010212 Rollberg 46.16 55.59
## 285 8010213 Körnerpark 30.46 49.66
## 286 8010214 Glasower Straße 38.53 59.05
## 287 8010301 Reuterkiez 20.43 32.23
## 288 8010302 Bouchéstraße 28.29 47.86
## 289 8010303 Donaustraße 30.07 54.30
## 290 8010404 Rixdorf 28.64 43.96
## 291 8010405 Hertzbergplatz 24.77 41.04
## 292 8010406 Treptower Straße Nord 48.13 68.40
## 293 8010407 Gewerbegebiet Ederstraße 24.04 47.83
## 294 8010508 Weiße Siedlung 63.14 66.34
## 295 8010509 Schulenburgpark 70.79 74.87
## 296 8010510 Gewerbegebiet Köllnische Heide 39.33 77.19
## 297 8020619 Buschkrugallee Nord 37.91 45.98
## 298 8020620 Tempelhofer Weg 32.98 48.68
## 299 8020621 Mohriner Allee Nord 3.97 6.17
## 300 8020622 Parchimer Allee 20.29 28.99
## 301 8020623 Ortolanweg 25.36 46.03
## 302 8020624 Britzer Garten 12.83 9.88
## 303 8020625 Handwerker-Siedlung 16.59 20.77
## 304 8020726 Buckow West 13.45 19.39
## 305 8020727 Buckow Mitte 19.90 29.12
## 306 8020728 Buckow Ost 32.91 47.07
## 307 8030829 Gropiusstadt Nord 42.21 55.63
## 308 8030830 Gropiusstadt Süd 28.84 45.50
## 309 8030831 Gropiusstadt Ost 37.54 45.67
## 310 8040932 Goldhähnchenweg 38.36 46.10
## 311 8040933 Vogelviertel Süd 14.84 26.07
## 312 8040934 Vogelviertel Nord 16.39 22.10
## 313 8041035 Blumenviertel 9.85 13.53
## 314 8041036 Zittauer Straße 8.24 11.18
## 315 8041037 Alt-Rudow 18.60 28.74
## 316 8041038 Waßmannsdorfer Chaussee 7.83 10.08
## 317 8041039 Frauenviertel 12.98 18.70
## 318 8041040 Waltersdorfer Chaussee Ost 15.02 21.27
## 319 9010101 Elsenstraße 16.28 16.82
## 320 9010102 Am Treptower Park Nord 5.56 0.00
## 321 9010201 Am Treptower Park Süd 9.33 9.20
## 322 9010202 Köpenicker Landstraße 17.27 30.83
## 323 9010301 Baumschulenstraße 13.95 19.07
## 324 9010302 Späthsfelde 4.80 7.28
## 325 9010401 Johannisthal West 13.33 20.24
## 326 9010402 Johannisthal Ost 12.38 15.36
## 327 9020501 Oberschöneweide West 27.83 42.76
## 328 9020502 Oberschöneweide Ost 22.91 31.00
## 329 9020601 Schnellerstraße 23.90 37.70
## 330 9020602 Oberspree 15.39 22.46
## 331 9020701 Adlershof West 3.67 6.74
## 332 9020702 Adlershof Ost 16.31 23.19
## 333 9020801 Spindlersfeld 14.30 16.82
## 334 9020802 Köllnische Vorstadt 30.08 41.05
## 335 9030901 Dorf Altglienicke 6.53 8.37
## 336 9030902 Wohngebiet II 36.92 48.77
## 337 9030903 Kölner Viertel 20.39 24.62
## 338 9031001 Bohnsdorf 9.28 11.90
## 339 9031101 Grünau 8.80 10.39
## 340 9031201 Karolinenhof 2.01 1.73
## 341 9031202 Schmöckwitz/Rauchfangswerder 6.25 9.47
## 342 9041301 Kietzer Feld/Nachtheide 12.77 16.75
## 343 9041302 Wendenschloß 3.43 2.91
## 344 9041401 Allende I 14.08 20.39
## 345 9041402 Allende II 11.36 16.07
## 346 9041501 Altstadt Kietz 25.34 32.20
## 347 9041601 Müggelheim 3.68 4.54
## 348 9051701 Hirschgarten 16.55 14.02
## 349 9051702 Bölschestraße 7.63 7.58
## 350 9051801 Rahnsdorf/Hessenwinkel 5.89 5.67
## 351 9051901 Dammvorstadt 11.40 11.89
## 352 9052001 Köpenick Nord 9.21 10.06
## 353 10010101 Marzahn West 33.42 43.71
## 354 10010102 Havemannstraße 33.58 42.03
## 355 10010203 Gewerbegebiet Bitterfelder Straße 76.03 58.85
## 356 10010204 Wuhletalstraße 33.00 40.88
## 357 10010205 Marzahn Ost 29.61 41.48
## 358 10010206 Ringkolonnaden 27.07 38.06
## 359 10010207 Marzahner Promenade 27.61 35.75
## 360 10010308 Marzahner Chaussee 13.08 9.15
## 361 10010309 Springpfuhl 21.73 30.45
## 362 10010310 Alt-Marzahn 21.34 28.01
## 363 10010311 Landsberger Tor 11.49 15.83
## 364 10020412 Alte Hellersdorfer Straße 45.49 47.53
## 365 10020413 Gut Hellersdorf 29.95 41.39
## 366 10020414 Helle Mitte 26.51 38.08
## 367 10020415 Hellersdorfer Promenade 44.66 56.35
## 368 10020416 Böhlener Straße 36.26 46.69
## 369 10020517 Adele-Sandrock-Straße 16.02 24.82
## 370 10020518 Schleipfuhl 32.29 39.18
## 371 10020519 Boulevard Kastanienallee 41.37 47.02
## 372 10020620 Nord II 20.80 37.08
## 373 10020621 Gelbes Viertel 38.55 45.76
## 374 10020622 Nord I 16.96 25.36
## 375 10020623 Rotes Viertel 16.84 24.12
## 376 10030724 Oberfeldstraße 2.05 2.15
## 377 10030725 Buckower Ring 16.61 18.03
## 378 10030726 Alt-Biesdorf 8.25 10.95
## 379 10030727 Biesdorf Süd 1.82 1.99
## 380 10040828 Kaulsdorf Nord 7.15 9.50
## 381 10040829 Alt-Kaulsdorf 7.86 6.92
## 382 10040830 Kaulsdorf Süd 2.22 2.19
## 383 10040931 Mahlsdorf Nord 2.32 1.63
## 384 10040932 Alt-Mahlsdorf 3.49 2.80
## 385 10040933 Mahlsdorf Süd 2.45 2.52
## 386 11010101 Dorf Malchow 8.01 12.86
## 387 11010102 Dorf Wartenberg 2.82 3.24
## 388 11010103 Dorf Falkenberg 8.42 10.84
## 389 11010204 Falkenberg Ost 37.94 46.97
## 390 11010205 Falkenberg West 28.28 38.82
## 391 11010206 Wartenberg Süd 28.39 40.26
## 392 11010207 Wartenberg Nord 30.61 38.63
## 393 11010308 Zingster Straße Ost 31.77 42.14
## 394 11010309 Zingster Straße West 36.65 49.50
## 395 11010310 Mühlengrund 18.38 23.14
## 396 11020411 Malchower Weg 23.65 31.29
## 397 11020412 Hauptstraße 21.48 27.40
## 398 11020513 Orankesee 6.21 6.75
## 399 11020514 Große-Leege-Straße 21.81 33.59
## 400 11020515 Landsberger Allee 23.63 31.69
## 401 11020516 Weiße Taube 4.31 5.16
## 402 11030617 Hohenschönhausener Straße 32.21 48.84
## 403 11030618 Fennpfuhl West 17.66 29.06
## 404 11030619 Fennpfuhl Ost 24.97 34.11
## 405 11030720 Herzbergstraße 12.65 27.15
## 406 11030721 Rüdigerstraße 15.62 20.43
## 407 11030824 Frankfurter Allee Süd 22.40 36.25
## 408 11040925 Victoriastadt 10.41 10.24
## 409 11040926 Weitlingstraße 16.90 22.82
## 410 11041022 Rosenfelder Ring 24.72 45.93
## 411 11041023 Gensinger Straße 26.01 33.02
## 412 11041027 Tierpark 26.05 36.05
## 413 11041128 Sewanstraße 22.21 34.75
## 414 11051229 Rummelsburg 13.94 11.26
## 415 11051330 Karlshorst West 8.20 10.01
## 416 11051331 Karlshorst Nord 7.15 5.13
## 417 11051332 Karlshorst Süd 3.73 2.49
## 418 12103115 Breitkopfbecken 31.29 47.23
## 419 12103116 Hausotterplatz 33.99 47.45
## 420 12103117 Letteplatz 33.62 48.26
## 421 12103218 Teichstraße 29.63 44.70
## 422 12103219 Schäfersee 32.04 49.91
## 423 12103220 Humboldtstraße 19.78 30.65
## 424 12214125 Waldidyll/Flughafensee 12.37 20.60
## 425 12214126 Tegel Süd 32.95 41.26
## 426 12214421 Reinickes Hof 29.69 44.41
## 427 12214422 Klixstraße 46.78 57.41
## 428 12214423 Mellerbogen 26.43 37.80
## 429 12214424 Scharnweberstraße 39.24 56.57
## 430 12214527 Alt-Tegel 11.99 18.68
## 431 12214528 Tegeler Forst 4.07 0.00
## 432 12224229 Konradshöhe/Tegelort 4.27 4.71
## 433 12224230 Heiligensee 4.98 5.37
## 434 12231101 Hermsdorf 5.16 5.31
## 435 12231102 Frohnau 4.04 4.38
## 436 12301203 Wittenau Süd 22.00 31.25
## 437 12301204 Wittenau Nord 11.88 19.07
## 438 12301205 Waidmannslust 9.12 9.02
## 439 12301206 Lübars 4.74 4.09
## 440 12302107 Schorfheidestraße 15.33 25.65
## 441 12302108 Märkisches Zentrum 48.78 56.83
## 442 12302109 Treuenbrietzener Straße 56.29 58.97
## 443 12302110 Dannenwalder Weg 51.87 55.38
## 444 12302211 Lübarser Straße 14.85 20.74
## 445 12302212 Rollbergesiedlung 60.85 65.68
## 446 12304313 Borsigwalde 18.18 30.41
## 447 12304314 Ziekowstraße/Freie Scholle 11.64 15.06
We can use the area ID for augmenting the planning areas with the welfare statistics. We’re joining a spatial with an ordinary dataframe, so we can use dplyr’s inner_join. Before that we can check that for each planning region we have welfare statistics information and vice versa:
setequal(bln_plan$areaid, bln_welfare$areaid)
## [1] TRUE
(bln <- inner_join(bln_plan, bln_welfare, by = 'areaid') %>%
select(-name))
## Simple feature collection with 447 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 370000.7 ymin: 5799520 xmax: 415785.3 ymax: 5837259
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 447 x 5
## areaid geometry areaname welfare welfare_chld
## <int> <MULTIPOLYGON [m]> <chr> <dbl> <dbl>
## 1 1011101 (((387256.6 5818552, 387323.1 58185… Stülerstra… 10.1 15.4
## 2 1011102 (((386767.5 5819393, 386768.3 58193… Großer Tie… 4.76 0
## 3 1011103 (((387952.6 5818275, 387986.7 58183… Lützowstra… 22.2 36.8
## 4 1011104 (((388847.1 5817875, 388855.5 58178… Körnerstra… 24.8 42.1
## 5 1011105 (((388129.5 5819015, 388157.1 58190… Nördlicher… 2.82 3.53
## 6 1011201 (((389845.7 5819286, 389840.9 58193… Wilhelmstr… 12.1 19.0
## 7 1011202 (((389851.2 5819673, 389847.5 58196… Unter den … 3.35 11.1
## 8 1011203 (((390147.6 5819688, 390307.9 58197… Unter den … 6.59 6.25
## 9 1011204 (((390523.4 5819192, 390705.4 58192… Leipziger … 10.3 19.9
## 10 1011301 (((389526.5 5820432, 389528.2 58204… Charitévie… 3.68 3.92
## # … with 437 more rows
Note that when joining spatial and ordinary dataframes, the order of arguments in the join function matters. If you have a spatial dataframe on the “left side” (x argument), the result will be a spatial dataframe. If you have an ordinary dataframe on the left side, the result will be an ordinary dataframe, i.e. the merged dataset loses its “spatial nature” and spatial operations won’t work with it any more (unless you convert it back to a spatial dataframe again with st_as_sf).
inner_join(bln_plan, bln_welfare, by = 'areaid') %>% class()
## [1] "sf" "tbl_df" "tbl" "data.frame"
inner_join(bln_welfare, bln_plan, by = 'areaid') %>% class()
## [1] "data.frame"
A quick plot confirms that it is similar to the one from the dashboard of the Helbig/Salomo study. I prefer using the base plot function for quick exploration of spatial data and usually only turn to ggplot2 for more advanced or “publication ready” plots. The help page for plot.sf provides some information about the arguments of this plotting function used for sf objects.
plot(bln['welfare_chld'])
hist(bln$welfare_chld,
main = 'Histogram of percentage of children under 15 years\nwhose parents receive social welfare',
xlab = '')
median(bln$welfare_chld)
## [1] 20.39
IQR(bln$welfare_chld)
## [1] 28.725
Marcel Helbig, Rita Nikolai and me collected data on school locations in East Germany from 1992 to 2015 in order to analyze the development of the network of schools in East Germany and which role private schools play in it. We created an interactive map and published the data and are planning an update with newer data (until 2020), from which will we use an excerpt. This dataset provides school locations from 2019 as longitude/latitude WGS84 coordinates which we can load and convert into a spatial dataset using st_as_sf.
(schools <- read.csv('data/grundschulen_berlin_2019.csv', stringsAsFactors = FALSE) %>%
st_as_sf(coords = c('lng', 'lat'), crs = 4326)) # EPSG 4326 is WGS84 lat/long coord.
## Simple feature collection with 435 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 13.12773 ymin: 52.37591 xmax: 13.70305 ymax: 52.63857
## Geodetic CRS: WGS 84
## First 10 features:
## traeger priv_schule_typ name ort plz
## 1 oeff Grundschule am Arkonaplatz Berlin Mitte 10115
## 2 oeff Papageno-Grundschule Berlin Mitte 10115
## 3 oeff Kastanienbaum-Grundschule Berlin Mitte 10119
## 4 oeff Grundschule Neues Tor Berlin Mitte 10115
## 5 oeff GutsMuths-Grundschule Berlin Mitte 10179
## 6 oeff Grundschule am Brandenburger Tor Berlin Mitte 10117
## 7 oeff City-Grundschule Berlin Mitte 10179
## 8 oeff Kurt-Tucholsky-Grundschule Berlin Mitte 10559
## 9 oeff Anne-Frank-Grundschule Berlin Mitte 10557
## 10 oeff Moabiter Grundschule Berlin Mitte 10557
## geometry
## 1 POINT (13.40039 52.53684)
## 2 POINT (13.39117 52.53274)
## 3 POINT (13.4018 52.52678)
## 4 POINT (13.38033 52.52759)
## 5 POINT (13.42395 52.51642)
## 6 POINT (13.38287 52.51312)
## 7 POINT (13.40892 52.50737)
## 8 POINT (13.35303 52.53026)
## 9 POINT (13.35537 52.51854)
## 10 POINT (13.35657 52.52197)
The variable traeger encodes whether a given facility is public (“oeff”) or private (“priv”) primary school. The only other important feature will be the school’s location, represented as point geometries that use the WGS84 CRS. The Berlin planning area data uses a different CRS, namely ETRS89 / UTM zone 33N. We need to make sure that both datasets use the same CRS, otherwise spatial operations such as spatial joins will fail. WGS84 uses spherical coordinates measured in degrees and calculations with these coordinates are quite complex because they happen on a curved surface. It’s better to use a CRS that uses planar coordinates for all the following spatial operations. The ETRS89 CRS used in the Berlin planning area data uses planar coordinates measured in meters, so we should transform the school locations to this CRS using st_transform:
schools <- mutate(schools, schoolid = 1:nrow(schools), .before = 1) %>%
st_transform(crs = st_crs(bln_plan))
schools
## Simple feature collection with 435 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 372830.3 ymin: 5803709 xmax: 411830.8 ymax: 5833408
## Projected CRS: ETRS89 / UTM zone 33N
## First 10 features:
## schoolid traeger priv_schule_typ name
## 1 1 oeff Grundschule am Arkonaplatz
## 2 2 oeff Papageno-Grundschule
## 3 3 oeff Kastanienbaum-Grundschule
## 4 4 oeff Grundschule Neues Tor
## 5 5 oeff GutsMuths-Grundschule
## 6 6 oeff Grundschule am Brandenburger Tor
## 7 7 oeff City-Grundschule
## 8 8 oeff Kurt-Tucholsky-Grundschule
## 9 9 oeff Anne-Frank-Grundschule
## 10 10 oeff Moabiter Grundschule
## ort plz geometry
## 1 Berlin Mitte 10115 POINT (391508 5821952)
## 2 Berlin Mitte 10115 POINT (390872.9 5821510)
## 3 Berlin Mitte 10119 POINT (391578.9 5820831)
## 4 Berlin Mitte 10115 POINT (390124.5 5820954)
## 5 Berlin Mitte 10179 POINT (393056.4 5819646)
## 6 Berlin Mitte 10117 POINT (390260.6 5819340)
## 7 Berlin Mitte 10179 POINT (392014.3 5818662)
## 8 Berlin Mitte 10559 POINT (388279.4 5821292)
## 9 Berlin Mitte 10557 POINT (388408.4 5819986)
## 10 Berlin Mitte 10557 POINT (388498.5 5820365)
Both datasets use the same coordinate system now, so we can plot the school locations on top of the planning areas. I will use ggplot2 this time to make a choropleth map of the welfare_chld variable and overlay that with the public and private primary school locations.
ggplot() +
geom_sf(aes(fill = welfare_chld), color = 'black', data = bln) +
geom_sf(aes(color = traeger), size = 1, alpha = 0.75, data = schools) +
scale_fill_binned(type = 'viridis', guide = guide_bins(title = '% Welfare')) +
scale_color_manual(values = c('oeff' = '#8C2F92', 'priv' = '#928C2F'),
labels = c('public school', 'private school'),
guide = guide_legend(title = '')) +
coord_sf(datum = NA) + # disable graticule
labs(title = "Public / private primary schools and poverty",
subtitle = "Choropleth map of percentage of children whose parents obtain social welfare.\nDots represent primary schools.") +
theme_minimal()
From the figure alone, it’s probably hard to assess whether there’s a pattern in the distribution of private and public schools regarding areas with higher percentage of social welfare recipients in the city. We can join the schools’ data with the socioeconomic information of the planning area they’re located in order to compare the social welfare statistics of regions around private schools with those around public schools. This can be done with a spatial join using st_join. By default, this function joins the spatial features of the first argument with features of the second argument when they intersect – in our case this means a school is linked with the planning area it’s located in. Note that the order of arguments matters here and that the spatial geometry of the first argument is retained in the resulting dataset.
(schools_plan <- st_join(schools, bln))
## Simple feature collection with 435 features and 10 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 372830.3 ymin: 5803709 xmax: 411830.8 ymax: 5833408
## Projected CRS: ETRS89 / UTM zone 33N
## First 10 features:
## schoolid traeger priv_schule_typ name
## 1 1 oeff Grundschule am Arkonaplatz
## 2 2 oeff Papageno-Grundschule
## 3 3 oeff Kastanienbaum-Grundschule
## 4 4 oeff Grundschule Neues Tor
## 5 5 oeff GutsMuths-Grundschule
## 6 6 oeff Grundschule am Brandenburger Tor
## 7 7 oeff City-Grundschule
## 8 8 oeff Kurt-Tucholsky-Grundschule
## 9 9 oeff Anne-Frank-Grundschule
## 10 10 oeff Moabiter Grundschule
## ort plz areaid areaname welfare welfare_chld
## 1 Berlin Mitte 10115 1011402 Arkonaplatz 4.46 3.53
## 2 Berlin Mitte 10115 1011401 Invalidenstraße 6.29 7.70
## 3 Berlin Mitte 10119 1011302 Oranienburger Straße 8.16 10.44
## 4 Berlin Mitte 10115 1011301 Charitéviertel 3.68 3.92
## 5 Berlin Mitte 10179 1011304 Karl-Marx-Allee 23.54 36.33
## 6 Berlin Mitte 10117 1011201 Wilhelmstraße 12.13 19.03
## 7 Berlin Mitte 10179 1011305 Heine-Viertel West 10.07 17.00
## 8 Berlin Mitte 10559 1022201 Stephankiez 20.31 34.06
## 9 Berlin Mitte 10557 1022206 Lüneburger Straße 12.11 13.47
## 10 Berlin Mitte 10557 1022206 Lüneburger Straße 12.11 13.47
## geometry
## 1 POINT (391508 5821952)
## 2 POINT (390872.9 5821510)
## 3 POINT (391578.9 5820831)
## 4 POINT (390124.5 5820954)
## 5 POINT (393056.4 5819646)
## 6 POINT (390260.6 5819340)
## 7 POINT (392014.3 5818662)
## 8 POINT (388279.4 5821292)
## 9 POINT (388408.4 5819986)
## 10 POINT (388498.5 5820365)
We can see that the schools’ data was linked with the data from the planning areas. We should also check whether there’s a school that was not located in any planning area (this may for example happen when point locations are very close to a border):
sum(is.na(schools_plan$areaid))
## [1] 0
All schools were linked with their planning region, so we can now compare the percentage of children whose parents obtain social welfare between public and private schools:
ggplot(schools_plan) +
geom_violin(aes(x = traeger, y = welfare_chld), draw_quantiles = c(0.5)) +
geom_jitter(aes(x = traeger, y = welfare_chld), alpha = 0.25) +
scale_x_discrete(labels = c('oeff' = 'public primary schools', 'priv' = 'private primary schools')) +
labs(title = 'Percentage of children whose parents obtain social welfare', x = '', y = '% welfare')
Our descriptive results indicate that the median percentage of children whose parents obtain social welfare is around six percent higher in the statistical regions around public schools than around private schools:
st_drop_geometry(schools_plan) %>%
group_by(traeger) %>%
summarise(median_welfare_chld = median(welfare_chld))
## # A tibble: 2 x 2
## traeger median_welfare_chld
## <chr> <dbl>
## 1 oeff 22.8
## 2 priv 16.8
This is an interesting descriptive result and we may continue with our spatial analysis from here. However, our current approach doesn’t consider the catchment area of a school correctly: Children from nearby planning areas will most likely visit a school but at the moment we only consider the one planning area in which a school is located. As an example, let’s zoom to school #391 “Evangelische Schule Berlin Buch” in the north of Berlin. As you can see, only considering the planning area in which this school is located omits the higher welfare rates in nearby areas:
ggplot(bln) +
geom_sf(aes(fill = welfare_chld), color = 'black') +
geom_sf(data = filter(schools, schoolid == 391), size = 3, color = 'red') +
scale_fill_binned(type = 'viridis', guide = guide_bins(title = '% Welfare')) +
coord_sf(datum = st_crs(bln), xlim = c(395e3, 401e3), ylim = c(583e4, 5837e3)) +
labs(title = 'School #391 "Evangelische Schule Berlin Buch" and\nsurrounding planning areas') +
theme_minimal()
We can improve on this by using each schools’ catchment area and then taking the weighted average of the welfare variable from the intersections with the planning areas. You may be able to get spatial data on school catchment areas from administrative sources, but this is not often the case. So how could we define such a catchment area? One possibility would be to construct a circle around each school which represents the catchment area for a certain radius. However, it’s hard to justify a certain value for that radius and the radius for such a catchment area should probably vary depending on where the school is located (smaller catchment areas in inner city schools than for schools in the outskirts).
Another approach relies on Voronoi regions. They partition the space between given points so that the Voronoi region around each point covers an area of minimal distance to that origin point. In other words: the Voronoi region around a school is the area in which all the households are located that are closest to that school. It’s a reasonable assumption that catchment areas for schools can be approximated by Voronoi regions in Berlin since children are by law assigned to schools near their place of residence. Parents can try to register their children at another school, but priority is given to children that live closer to that school.
Voronoi regions can be generated with st_voronoi, which accepts the points as MULTIPOINT geometry object. The second argument is an envelope polygon for which we’ll use the the Berlin borders. The resulting object is a GEOMETRYCOLLECTION geometry object which we pass on to st_collection_extract and st_sfc in order to transform this to geometry set object that has the same CRS as our other spatial data (ETRS89).
bln_outline <- st_union(bln$geometry) # Berlin borders
(voronois <- st_multipoint(st_coordinates(schools$geometry)) %>%
st_voronoi(bln_outline) %>%
st_collection_extract() %>%
st_sfc(crs = st_crs(bln)))
## Geometry set for 435 features
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 333829.7 ymin: 5764709 xmax: 450831.3 ymax: 5872408
## Projected CRS: ETRS89 / UTM zone 33N
## First 5 geometries:
## POLYGON ((333829.7 5824739, 333829.7 5843705, 3...
## POLYGON ((333829.7 5805022, 333829.7 5824739, 3...
## POLYGON ((373574.4 5820108, 374364.5 5821773, 3...
## POLYGON ((360399.8 5809719, 373736.3 5814453, 3...
## POLYGON ((373887.8 5764709, 333829.7 5764709, 3...
We can now plot the generated regions along with the school locations:
{
plot(bln_outline)
plot(schools$geometry, col = 'blue', pch = 19, add = TRUE)
plot(voronois, border = 'red', col = NA, add = TRUE)
}
We can see that the Voronoi regions extend beyond the borders of Berlin so we should take the intersection between the Voronoi regions and the Berlin border in order to “cut” these regions:
bln_vor <- st_intersection(voronois, bln_outline)
{
plot(bln_vor, border = 'red', col = NA)
plot(schools$geometry, col = 'blue', pch = 19, add = TRUE)
}
Let’s overlay the planning areas with the schools’ Voronoi regions to see how they differ. Blue dots represent the schools.
{
plot(bln$geometry)
plot(bln_vor, col = '#80000055', border = 'white', add = TRUE)
plot(schools$geometry, col = 'blue', pch = 19, cex = 0.25, add = TRUE)
}
The goal is now to calculate the weighted average of the welfare rate for a given school by taking into account all planning areas that the school’s Voronoi region intersects with. The weights will be determined by the intersection area between the Voronoi region and the planning areas. I will first do this with a single school only to illustrate how it works. This school will be #270 “Müggelheimer Schule” located in the south east:
(exampleschool <- schools[schools$schoolid == 270,])
## Simple feature collection with 1 feature and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 409354.2 ymin: 5807970 xmax: 409354.2 ymax: 5807970
## Projected CRS: ETRS89 / UTM zone 33N
## schoolid traeger priv_schule_typ name
## 270 270 oeff Müggelheimer Schule (Grundschule)
## ort plz geometry
## 270 Berlin Treptow-Köpenick 12559 POINT (409354.2 5807970)
{
plot(bln$geometry)
plot(bln_vor, col = '#80000055', border = 'white', add = TRUE)
plot(exampleschool$geometry, col = 'red', pch = 19, add = TRUE)
}
First, we need the Voronoi region of that school. We can again apply st_join for this in order to get the Voronoi region that intersects with the school. st_join only works with spatial dataframes so we need to convert the geometry set object via st_as_sf. Note that the Voronoi regions should be the first argument in the st_join function since we want to retain the region’s geometry in the resulting dataset. We also use an inner join instead of a left join by setting left = FALSE so that the result set only contains the single Voronoi region that intersects with the school.
(examplevor <- st_join(st_as_sf(bln_vor), exampleschool, left = FALSE))
## Simple feature collection with 1 feature and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 406269.6 ymin: 5805035 xmax: 413198.9 ymax: 5810714
## Projected CRS: ETRS89 / UTM zone 33N
## schoolid traeger priv_schule_typ name
## 1 270 oeff Müggelheimer Schule (Grundschule)
## ort plz geometry
## 1 Berlin Treptow-Köpenick 12559 POLYGON ((406269.6 5806871,...
We can see that the extracted Voronoi region seems to be correct:
{
plot(examplevor$geometry)
plot(exampleschool$geometry, col = 'red', pch = 19, add = TRUE)
}
The next step is to get the intersections between the planning areas and Voronoi regions, i.e. “to cut” the planning areas according to the school’s Voronoi region. We do this with the help of st_intersection, which performs the intersection between spatial objects. The result is a spatial dataframe of the six planning regions that overlap with the school’s Voronoi region:
(exampleareas <- st_intersection(bln, examplevor))
## Simple feature collection with 6 features and 10 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 406269.6 ymin: 5805035 xmax: 413198.9 ymax: 5810714
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 6 x 11
## areaid areaname welfare welfare_chld schoolid traeger priv_schule_typ name
## * <int> <chr> <dbl> <dbl> <int> <chr> <chr> <chr>
## 1 9031101 Grünau 8.8 10.4 270 oeff "" Mügge…
## 2 9031201 Karoline… 2.01 1.73 270 oeff "" Mügge…
## 3 9031202 Schmöckw… 6.25 9.47 270 oeff "" Mügge…
## 4 9041301 Kietzer … 12.8 16.8 270 oeff "" Mügge…
## 5 9041601 Müggelhe… 3.68 4.54 270 oeff "" Mügge…
## 6 9051801 Rahnsdor… 5.89 5.67 270 oeff "" Mügge…
## # … with 3 more variables: ort <chr>, plz <int>, geometry <GEOMETRY [m]>
{
plot(exampleareas$geometry)
plot(exampleschool$geometry, col = 'red', pch = 19, add = TRUE)
}
We can but that a little bit into perspective again and display this on the Berlin planning regions map overlayed with the schools’ Voronoi regions:
ggplot() +
geom_sf(color = 'black', data = bln) +
geom_sf(fill = NA, color = 'red', linetype = 'dotted', data = bln_vor) +
geom_sf(aes(fill = welfare_chld), color = 'black', data = exampleareas) +
geom_sf(fill = NA, color = 'red', data = examplevor) +
geom_sf(color = 'red', size = 3, data = exampleschool) +
scale_fill_continuous(type = 'viridis', guide = guide_colorbar(title = '% Welfare')) +
coord_sf(datum = NA) +
labs(title = "Berlin statistical regions and Voronoi regions of schools",
subtitle = "Highlighted school #270 with surrounding Voronoi region and planning areas intersection.") +
theme_minimal()
All that is left now for our example school is to take the weighted average of the welfare rate. The weights are the area of the planning area intersections so that planning areas with larger overlap in the Voronoi region have a higher influence on the overall average. The following shows the planning area intersections along with their area as calculated via st_area. We can see that the welfare rate of ~5% in Müggelheim will have the largest weight, followed by the ~17% rate in Kietzer Feld/Nachtheide:
cbind(exampleareas[c('areaname', 'welfare_chld')], area = st_area(exampleareas))
## Simple feature collection with 6 features and 3 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 406269.6 ymin: 5805035 xmax: 413198.9 ymax: 5810714
## Projected CRS: ETRS89 / UTM zone 33N
## areaname welfare_chld area
## 1 Grünau 10.39 318930.01 [m^2]
## 2 Karolinenhof 1.73 15937.56 [m^2]
## 3 Schmöckwitz/Rauchfangswerder 9.47 757552.06 [m^2]
## 4 Kietzer Feld/Nachtheide 16.75 6117595.30 [m^2]
## 5 Müggelheim 4.54 13633749.67 [m^2]
## 6 Rahnsdorf/Hessenwinkel 5.67 216763.29 [m^2]
## geometry
## 1 POLYGON ((406451.3 5807406,...
## 2 MULTIPOLYGON (((406681.9 58...
## 3 POLYGON ((410293.6 5805317,...
## 4 POLYGON ((408799.1 5810501,...
## 5 POLYGON ((406890.4 5806615,...
## 6 MULTIPOLYGON (((410155.4 58...
We pass these area measurements to weighted.mean (stripping the m² unit via as.numeric since weighted.mean can’t handle it) and obtain a weighted average welfare rate of ~8.4% which is quite a bit higher than the ~4.5% we get when using the former approach (linking the school with its planning area “Müggelheim”):
weighted.mean(exampleareas$welfare_chld, as.numeric(st_area(exampleareas)))
## [1] 8.362149
# former approach: linking the school with its planning area
schools_plan[schools_plan$schoolid == 270, ]$welfare_chld
## [1] 4.54
We’ll next perform these calculations for all schools. First, we find the school inside each Voronoi region using st_join as before:
(school_vor <- st_join(st_as_sf(bln_vor), schools))
## Simple feature collection with 435 features and 6 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 370000.7 ymin: 5799520 xmax: 415785.3 ymax: 5837259
## Projected CRS: ETRS89 / UTM zone 33N
## First 10 features:
## schoolid traeger priv_schule_typ
## 1 140 oeff
## 2 155 oeff
## 3 143 oeff
## 4 145 oeff
## 5 161 oeff
## 6 369 priv Waldorf
## 7 212 oeff
## 8 183 oeff
## 9 403 priv Frei
## 10 142 oeff
## name ort
## 1 Linden-Grundschule Berlin Spandau
## 2 Mary-Poppins-Grundschule Berlin Spandau
## 3 Astrid-Lindgren-Grundschule Berlin Spandau
## 4 Grundschule am Ritterfeld Berlin Spandau
## 5 Conrad-Schule (Grundschule) Berlin Steglitz-Zehlendorf
## 6 Freie Waldorfschule Havelhöhe - Eugen Kolisko Berlin Spandau
## 7 Käthe-Kollwitz-Grundschule Berlin Tempelhof-Schöneberg
## 8 Mercator-Grundschule Berlin Steglitz-Zehlendorf
## 9 Immanuel-Grundschule Berlin Spandau
## 10 Zeppelin-Grundschule Berlin Spandau
## plz geometry
## 1 13591 POLYGON ((372875.8 5823264,...
## 2 14089 POLYGON ((373855.6 5817213,...
## 3 13591 POLYGON ((373717.9 5820410,...
## 4 14089 POLYGON ((371755 5813750, 3...
## 5 14109 POLYGON ((371780.3 5810871,...
## 6 14089 POLYGON ((373736.3 5814453,...
## 7 12307 POLYGON ((389105.6 5805524,...
## 8 12207 POLYGON ((384245 5808272, 3...
## 9 13589 POLYGON ((374401.3 5825098,...
## 10 13591 POLYGON ((374299.7 5822374,...
Next we calculate the planning area intersections, their areas and weighted average of the welfare rate for each school’s Voronoi region using sapply. This computation takes some seconds to complete and in the end adds the weighted average of the welfare rate as welfare_chld variable to the schools’ Voronoi region dataset:
school_vor$welfare_chld <- sapply(school_vor$geometry, function(vor) {
# the Voronoi polygon "vor" loses the CRS during sapply -> set it here again
vor <- st_sfc(vor, crs = st_crs(bln))
areas <- st_intersection(bln, vor)
weighted.mean(areas$welfare_chld, as.numeric(st_area(areas)))
})
school_vor
## Simple feature collection with 435 features and 7 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 370000.7 ymin: 5799520 xmax: 415785.3 ymax: 5837259
## Projected CRS: ETRS89 / UTM zone 33N
## First 10 features:
## schoolid traeger priv_schule_typ
## 1 140 oeff
## 2 155 oeff
## 3 143 oeff
## 4 145 oeff
## 5 161 oeff
## 6 369 priv Waldorf
## 7 212 oeff
## 8 183 oeff
## 9 403 priv Frei
## 10 142 oeff
## name ort
## 1 Linden-Grundschule Berlin Spandau
## 2 Mary-Poppins-Grundschule Berlin Spandau
## 3 Astrid-Lindgren-Grundschule Berlin Spandau
## 4 Grundschule am Ritterfeld Berlin Spandau
## 5 Conrad-Schule (Grundschule) Berlin Steglitz-Zehlendorf
## 6 Freie Waldorfschule Havelhöhe - Eugen Kolisko Berlin Spandau
## 7 Käthe-Kollwitz-Grundschule Berlin Tempelhof-Schöneberg
## 8 Mercator-Grundschule Berlin Steglitz-Zehlendorf
## 9 Immanuel-Grundschule Berlin Spandau
## 10 Zeppelin-Grundschule Berlin Spandau
## plz geometry welfare_chld
## 1 13591 POLYGON ((372875.8 5823264,... 17.248205
## 2 14089 POLYGON ((373855.6 5817213,... 3.720224
## 3 13591 POLYGON ((373717.9 5820410,... 42.571484
## 4 14089 POLYGON ((371755 5813750, 3... 3.136074
## 5 14109 POLYGON ((371780.3 5810871,... 4.300000
## 6 14089 POLYGON ((373736.3 5814453,... 4.497095
## 7 12307 POLYGON ((389105.6 5805524,... 20.714686
## 8 12207 POLYGON ((384245 5808272, 3... 32.495663
## 9 13589 POLYGON ((374401.3 5825098,... 32.457281
## 10 13591 POLYGON ((374299.7 5822374,... 8.607028
plot(school_vor['welfare_chld'], main = 'Weighted average of percentage of children\nwhose parents receive social welfare per school Voronoi region')
We again compare public and private schools, this time with our revised calculations:
ggplot(school_vor) +
geom_violin(aes(x = traeger, y = welfare_chld), draw_quantiles = c(0.5)) +
geom_jitter(aes(x = traeger, y = welfare_chld), alpha = 0.25)
The median percentage of children whose parents obtain social welfare is still higher for public schools, but the difference is now four instead of six percent.
st_drop_geometry(school_vor) %>%
group_by(traeger) %>%
summarise(median_welfare_chld = median(welfare_chld))
## # A tibble: 2 x 2
## traeger median_welfare_chld
## <chr> <dbl>
## 1 oeff 20.9
## 2 priv 17.0